Team clearly described the dataset and clearly described the motivation behind studying the data. Team provided scholarly citations or quantitative facts to describe the motivation.
Team clearly described their data cleaning and outlier removal process. Team presented insightful visualizations that motivate further exploratory or confirmatory analysis.
# PART 1: Read CSV, merge, clean and plot outliers.
library(readr)
library(readxl)
library(dplyr)
library(countrycode)
library(car)
source('Read_Clean.R')
cleaned <- Read_Clean()
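As a minimal sketch of what an outlier check during cleaning might look like (the actual logic lives in Read_Clean.R, which is not shown here), the snippet below flags values lying more than 1.5 times the interquartile range beyond the hinges, using base R's boxplot.stats(). The vector `income` and its values are hypothetical stand-ins, not the project data.

```r
# Hypothetical stand-in values; the real data comes from Read_Clean().
income <- c(2.1, 2.4, 2.2, 2.5, 2.3, 9.8)

# boxplot.stats() returns the points lying beyond 1.5 * IQR from the hinges.
outliers <- boxplot.stats(income)$out

# Logical mask that could be used to inspect or drop the flagged values.
is_outlier <- income %in% outliers
print(income[is_outlier])
```

Whether flagged values should be removed or only plotted is a judgment call that depends on the variable; the sketch only identifies them.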
Team applied dimension reduction analysis correctly and discussed the motivation behind that. Also, they provided interesting insights into the results.
# PART 2: MDS
[Image: MDS plot]
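The MDS code itself is not shown in this part. As a rough sketch of how classical multidimensional scaling could be run in base R, the snippet below standardizes the variables, computes Euclidean distances and projects them into two dimensions with cmdscale(). It uses the built-in USArrests data as a stand-in for `cleaned`, so the numbers do not correspond to the project results.

```r
# Stand-in data: the built-in USArrests data set, not the project's `cleaned` data.
X <- scale(USArrests)        # standardize so no single variable dominates the distances

d <- dist(X)                 # Euclidean distances between observations

# Classical (metric) MDS: 2-D coordinates, one row per observation.
mds <- cmdscale(d, k = 2)
head(mds)
```

The two columns of `mds` can then be plotted against each other and labelled by country or continent.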
# PART 3: PCA
library(pryr)
library(ggbiplot) #if the library is not present use the code below
#library(devtools)
#install_github("vqv/ggbiplot")
source('PCA.R')
(PrinCompPlot <- PCA(cleaned))
## [[1]]
##
## [[2]]
Excluded from the PCA: pop_total, murder_pp, armed_pp, urban_pop_tot, investments_per_ofGDP. These variables weaken the correlation structure among the remaining variables, so more principal components would be needed to explain the relationships.
Ellipses help visualize the concentration of points. Their size is influenced by outliers.
Central and South America sit in the middle of the plot.
PC1 and PC3 explain Africa better, because PC3 mostly captures inequality and development.
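To illustrate the reasoning about excluded variables, here is a small sketch that compares the cumulative variance explained by a PCA with and without a few columns. It uses the built-in mtcars data as a stand-in, and the excluded column names are arbitrary and hypothetical; the project's own PCA.R may do this differently.

```r
# Stand-in data: built-in mtcars, not the project's `cleaned` data.
num <- mtcars
excluded <- c("qsec", "gear")            # hypothetical choice, for illustration only
keep <- setdiff(names(num), excluded)

# PCA on standardized variables, with and without the excluded columns.
pca_all <- prcomp(num, scale. = TRUE)
pca_sub <- prcomp(num[, keep], scale. = TRUE)

# Cumulative share of total variance explained by the leading components.
cum_var <- function(p) cumsum(p$sdev^2) / sum(p$sdev^2)

round(cum_var(pca_all)[1:3], 3)
round(cum_var(pca_sub)[1:3], 3)
```

If the first few components of the reduced model explain a larger share of the variance, fewer components are needed, which is the argument made above for dropping weakly correlated variables.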
# PART 3: Hierarchical Clustering between Continents
library(ape)
source('cluster_continents.R')
Cl_continents <- cluster_continents(cleaned)
All variables are included.
South America, North America and Europe are very similar, and Central America, Asia, Oceania and Africa are similar to each other. Interestingly, Africa is clustered with Oceania, which includes Australia and New Zealand but also many small islands that pull Oceania toward Africa's level.
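A minimal sketch of clustering at the group level, assuming the approach is to average each variable per continent and then cluster those averages (cluster_continents.R is not shown, so this is an assumption). The built-in iris data stands in for `cleaned`, with `Species` playing the role of the continent column.

```r
# Stand-in data: iris, with Species playing the role of "continent".
means <- aggregate(. ~ Species, data = iris, FUN = mean)

# One row per group, one column per variable.
m <- as.matrix(means[, -1])
rownames(m) <- as.character(means$Species)

# Standardize the group means, then cluster on Euclidean distances.
hc <- hclust(dist(scale(m)), method = "average")

# Cut the tree into two clusters to see which groups end up together.
groups <- cutree(hc, k = 2)
print(groups)
```

Because a few extreme members can shift a group mean considerably, small groups with very different members (as noted for Oceania) may end up next to unexpected neighbours.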
# PART 4: K-means & Model Based Clustering between Countries
library(mclust)
library(maptools)
source('clusters_countries.R')
Cl_countries <- clusters_countries(cleaned)
A chi-square test compares the dependency between the cluster groups and the continents. The model-based groups are more similar to the continents.
Model-based group 7 is difficult to name. Result for this group:
##      pop_total murder_pp armed_pp phones_p100 children_p_woman life_exp_yrs suicide_pp urban_pop_tot sex_ratio_p100
## [1,] 239114394         0    0.011     146.317             2.07       78.053          0     118882678        148.681
##      corruption_CPI internet_%of_pop child_mort_p1000 income_per_person investments_per_ofGDP   gini
## [1,]         53.677           75.725           10.791          50579.31                29.942 39.722
Developed countries are split into 3 groups.
Poor countries are the same in both models.
We lose the “crowded” group from k-means. It transforms into group 7, which is characterized by high income, sex ratio, population and phones.
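The chi-square comparison between cluster groups and continents could look roughly like the sketch below. It uses only base R (kmeans() and chisq.test()) on the built-in iris data, with `Species` standing in for the continent column; the model-based clustering from mclust is left out to keep the example dependency-free, and clusters_countries.R may differ in its details.

```r
set.seed(1)  # k-means uses random starts; fix the seed for reproducibility

# Stand-in data: iris measurements, with Species as the known grouping.
X <- scale(iris[, 1:4])

# k-means with several random starts to reduce the risk of a poor local optimum.
km <- kmeans(X, centers = 3, nstart = 25)

# Contingency table of cluster membership against the known grouping.
tab <- table(cluster = km$cluster, group = iris$Species)
print(tab)

# Chi-square test of independence: a small p-value means cluster
# membership depends on the grouping.
test <- chisq.test(tab)
print(test$statistic)
```

Running the same test on the labels from each clustering method gives one rough way to judge which set of groups tracks the continents more closely, though the statistic also depends on the number of clusters.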
# PART 5: EFA
source('EFA.R')
EFA(cleaned)
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4
## pop_total 0.995
## murder_pp 0.825
## armed_pp
## phones_p100 0.615
## children_p_woman -0.918
## life_exp_yrs 0.875
## suicide_pp
## urban_pop_tot 0.958
## sex_ratio_p100 0.538
## corruption_CPI 0.538
## internet_%of_pop 0.847
## child_mort_p1000 -0.940
## income_per_person 0.575 0.768
## investments_per_ofGDP
## gini 0.625
##
## Factor1 Factor2 Factor3 Factor4
## SS loadings 4.439 1.949 1.262 1.153
## Proportion Var 0.296 0.130 0.084 0.077
## Cumulative Var 0.296 0.426 0.510 0.587
## NULL
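For reading the output above: the "SS loadings" row is the column sum of squared loadings, and "Proportion Var" is that sum divided by the number of variables (here 4.439 / 15 ≈ 0.296). Blank cells are loadings whose absolute value falls below the print cutoff. The sketch below reproduces these quantities with factanal() on the built-in ability.cov covariance matrix, which is a stand-in and not the project data; EFA.R itself is not shown.

```r
# Stand-in: two-factor maximum-likelihood factor analysis on a built-in covariance matrix.
fa <- factanal(factors = 2, covmat = ability.cov)

# Plain numeric matrix of loadings (variables in rows, factors in columns).
L <- unclass(loadings(fa))

ss_loadings <- colSums(L^2)           # "SS loadings" row
prop_var    <- ss_loadings / nrow(L)  # "Proportion Var" row
cum_var     <- cumsum(prop_var)       # "Cumulative Var" row

# Raising the cutoff hides small loadings, which produces the blank cells.
print(loadings(fa), cutoff = 0.5)
round(rbind(ss_loadings, prop_var, cum_var), 3)
```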
# PART 6: CFA
#???????